1- Explain the Data¶

The "Students Performance in Exams" dataset records how students performed in exams. It contains scores for math, reading, and writing exams, together with background information on each student: gender, race/ethnicity, parental level of education, lunch type (standard or free/reduced), and whether a test preparation course was completed. The data is a table in which each row is one student and each column is one attribute (feature). There are 1000 rows (students) and 8 columns (features).

Here is a description of each column:

  • gender: Male or female.
  • race/ethnicity: This column has five categories: group A, group B, group C, group D, and group E.
  • parental level of education: This column has six categories: some high school, high school, some college, associate's degree, bachelor's degree, and master's degree.
  • lunch: standard or free/reduced.
  • test preparation course: This column has two categories: completed or none.
  • math score: The student's score on the math exam.
  • reading score: The student's score on the reading exam.
  • writing score: The student's score on the writing exam.

2- Which variables do you use for the study, and why?¶

  • gender: Male or female.
  • race/ethnicity: This column has five categories: group A, group B, group C, group D, and group E.
  • parental level of education: This column has six categories: some high school, high school, some college, associate's degree, bachelor's degree, and master's degree.
  • lunch: standard or free/reduced.
  • test preparation course: This column has two categories: completed or none.
  • math score: The student's score on the math exam.
  • reading score: The student's score on the reading exam.
  • writing score: The student's score on the writing exam.

Why: these variables make it possible to study two questions:
    1. How effective is the test preparation course?
    2. Which major factors contribute to test outcomes?
  • Answers:
  • 1. To determine the effectiveness of the test preparation course, you need some statistical analysis of the data. One way is to compare the exam scores of students who completed the course with those of students who did not. You could calculate the mean, median, and standard deviation of the scores for each group, and then use a statistical test to check whether the difference between the groups is significant.

It's also important to consider other factors that may affect exam performance, such as the students' gender, race/ethnicity, parental level of education, and lunch status. You could control for these variables in your analysis by comparing only students who are similar in these regards (for example, comparing only male students or only students from group A).

It's worth noting that the "Students Performance in Exams" dataset is a relatively small dataset with only 1000 rows, so it may not be representative of the entire population of students. Additionally, the data may not include all relevant factors that could affect exam performance, so it's important to interpret the results of any analysis with caution.
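
The comparison described above could be sketched roughly as follows. The small DataFrame is invented toy data standing in for the real file (only the column names and category labels come from the dataset), and Welch's t statistic is just one possible test, chosen here as an assumption rather than taken from the original analysis.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real data: the values are invented for illustration only.
df = pd.DataFrame({
    "test preparation course": ["completed", "none", "completed", "none",
                                "completed", "none", "none", "completed"],
    "math score": [78, 60, 85, 55, 70, 65, 58, 90],
})

# Summary statistics of the math score for each group.
summary = df.groupby("test preparation course")["math score"].agg(
    ["count", "mean", "median", "std"]
)
print(summary)

# Welch's t statistic (does not assume equal variances in the two groups).
done = df.loc[df["test preparation course"] == "completed", "math score"]
none = df.loc[df["test preparation course"] == "none", "math score"]
t_stat = (done.mean() - none.mean()) / np.sqrt(
    done.var(ddof=1) / len(done) + none.var(ddof=1) / len(none)
)
print(f"Welch t statistic: {t_stat:.2f}")
```

A large absolute t value hints at a real difference between the groups. Turning it into a p-value needs the t distribution, for example via `scipy.stats.ttest_ind(done, none, equal_var=False)` if SciPy is installed.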

  • 2. It's difficult to determine the exact factors that contribute to test outcomes without more information about the specific context in which the tests were taken. However, some general factors that could potentially affect test performance include:

  • The student's level of knowledge and understanding of the material being tested
  • The student's ability to apply that knowledge to solve problems or answer questions
  • The student's ability to recall information accurately
  • The student's level of motivation and engagement in the learning process
  • The student's level of stress or anxiety about the test
  • The student's overall physical and mental health
  • The student's access to resources, such as textbooks, tutors, or study groups
  • The student's study habits and test-taking strategies

External factors, such as the quality of the school or the teacher, the availability of support services, and the overall socio-economic environment, can also have an impact on test performance.

3- Draw appropriate graphs by using the libraries¶

Import the libraries¶

In [1]:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
In [2]:
x = pd.read_csv("exams (1).csv")
In [3]:
x.head()
Out[3]:
gender race/ethnicity parental level of education lunch test preparation course math score reading score writing score
0 male group A high school standard completed 67 67 63
1 female group D some high school free/reduced none 40 59 55
2 male group E some college free/reduced none 59 60 50
3 male group B high school standard none 77 78 68
4 male group E associate's degree standard completed 78 73 68
In [4]:
x.tail()
Out[4]:
gender race/ethnicity parental level of education lunch test preparation course math score reading score writing score
995 male group C high school standard none 73 70 65
996 male group D associate's degree free/reduced completed 85 91 92
997 female group C some high school free/reduced none 32 35 41
998 female group C some college standard none 73 74 82
999 male group A some college standard completed 65 60 62

4- Countplot¶

In [5]:
# draw a count plot to see the frequency of each value in a column
sns.countplot(x= x['gender'])
Out[5]:
<AxesSubplot:xlabel='gender', ylabel='count'>
In [6]:
#group countplot
sns.countplot(x= x['gender'], hue=x['parental level of education'])
Out[6]:
<AxesSubplot:xlabel='gender', ylabel='count'>
In [7]:
#group countplot
sns.countplot(x= x['gender'], hue=x['test preparation course'])
plt.title('Countplot for Exam performance',pad=20, fontsize=15)
Out[7]:
Text(0.5, 1.0, 'Countplot for Exam performance')
In [8]:
#group countplot
sns.countplot(x= x['gender'], hue=x['lunch'],saturation = 1, palette = 'colorblind')
Out[8]:
<AxesSubplot:xlabel='gender', ylabel='count'>
In [9]:
#group countplot
sns.countplot(y= x['gender'], hue=x['lunch'],saturation = 1, palette = 'Accent')
#saving a plot
plt.savefig('cont_plot.pdf')
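
The bar heights of a grouped count plot are simply counts per category pair, so they can also be read as a table. A minimal sketch, assuming a tiny invented frame in place of the real file (only the column names and labels follow the dataset):

```python
import pandas as pd

# Toy stand-in rows, invented for illustration only.
df = pd.DataFrame({
    "gender": ["male", "female", "male", "male", "female", "female"],
    "lunch": ["standard", "free/reduced", "free/reduced",
              "standard", "standard", "standard"],
})

# Rows are genders, columns are lunch types, cells are student counts:
# the same numbers a grouped count plot draws as bar heights.
counts = pd.crosstab(df["gender"], df["lunch"])
print(counts)
```

Reading the exact counts from such a table is often easier than estimating them from the bars.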

5- Scatterplot¶

A scatter plot shows the relationship between two numerical variables.¶

In [10]:
#how to draw a scatter plot
sns.scatterplot(data = x, x='math score',y='reading score')
Out[10]:
<AxesSubplot:xlabel='math score', ylabel='reading score'>
In [11]:
#how to draw a scatter plot
sns.scatterplot(data = x, x='math score',y='writing score',hue='lunch',palette=['Green','red'])
Out[11]:
<AxesSubplot:xlabel='math score', ylabel='writing score'>
In [12]:
#how to draw a scatter plot
sns.scatterplot(data = x, x='reading score',y='writing score',hue='gender')
Out[12]:
<AxesSubplot:xlabel='reading score', ylabel='writing score'>
In [13]:
#how to draw a scatter plot
sns.scatterplot(data = x, x='writing score',y='math score',hue='test preparation course',palette=['Green','darkviolet'])
# each point is one student, placed at its (x, y) pair of scores
Out[13]:
<AxesSubplot:xlabel='writing score', ylabel='math score'>

Using the matplotlib library to create the graph¶

In [14]:
# make a scatter plot with matplotlib
plt.scatter(x['math score'], x['reading score'], marker='*', c=x['writing score'])  # marker sets the point symbol; c colors each point by writing score
# adding labels
plt.xlabel("math score")
plt.ylabel("reading score")
plt.title("Math score vs. reading score (colored by writing score)")
plt.colorbar()
plt.show()
In [15]:
plt.scatter(x['gender'], x['reading score'], marker='*', c=x['writing score'])  # gender is categorical, so the points fall into two columns
# adding labels
plt.xlabel("gender")
plt.ylabel("reading score")
plt.title("Reading score by gender (colored by writing score)")
plt.colorbar()
plt.show()

6- Box plot¶

A simple box plot gives a compact summary of a distribution: the median (2nd quartile), the 1st and 3rd quartiles, the whiskers, and any outliers.¶

In [16]:
#create a boxplot
sns.boxplot(data = x,y='math score',showmeans= True,)
Out[16]:
<AxesSubplot:ylabel='math score'>
In [17]:
# create a boxplot
sns.boxplot(data = x,y='writing score',palette='Dark2')
Out[17]:
<AxesSubplot:ylabel='writing score'>
In [18]:
# create a boxplot
sns.boxplot(data = x,y='reading score',palette='Set1')
Out[18]:
<AxesSubplot:ylabel='reading score'>
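
The numbers a box plot draws can be computed directly. The sketch below uses invented scores and assumes the default whisker rule of 1.5 times the interquartile range, which is the default in matplotlib and seaborn.

```python
import numpy as np

# Invented scores, for illustration only.
scores = np.array([13, 45, 56, 60, 66, 67, 70, 77, 80, 100])

# Box edges and the line inside the box.
q1, median, q3 = np.percentile(scores, [25, 50, 75])
iqr = q3 - q1

# Points outside these fences are drawn as individual outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = scores[(scores < lower_fence) | (scores > upper_fence)]

print(q1, median, q3, iqr)
print(outliers)
```

The whiskers themselves end at the most extreme data points that still lie inside the fences.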

Box plot with the mean marker¶

In [19]:
# create a boxplot
sns.boxplot(data = x, x='gender',y='math score',showmeans= True,)
Out[19]:
<AxesSubplot:xlabel='gender', ylabel='math score'>

Boxplot with swarmplot¶

In [20]:
# create a boxplot
sns.boxplot(data = x, x='gender',y='math score',showmeans= True,)
sns.swarmplot(data = x, x='gender',y='math score', size=3, color='black')
Out[20]:
<AxesSubplot:xlabel='gender', ylabel='math score'>

The swarm plot overlaid on the box plot above draws every student as a point, so the shape of the distribution within each group becomes visible.¶

7- Histogram plot¶

In [21]:
#histplot
sns.histplot(data=x, x='math score')
Out[21]:
<AxesSubplot:xlabel='math score', ylabel='Count'>
In [22]:
#histplot (palette only applies together with hue, so it is omitted here)
sns.histplot(data=x, x='reading score', binwidth=1)
Out[22]:
<AxesSubplot:xlabel='reading score', ylabel='Count'>
In [23]:
#histplot (palette only applies together with hue, so it is omitted here)
sns.histplot(data=x, x='writing score')
Out[23]:
<AxesSubplot:xlabel='writing score', ylabel='Count'>
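
A histogram is a count of values per bin. As a rough sketch of what happens behind the plot, the following uses invented scores and hand-picked bin edges (both are assumptions for illustration, not the bins seaborn would choose):

```python
import numpy as np

# Invented scores, for illustration only.
scores = np.array([40, 59, 67, 73, 77, 78, 85, 32, 65, 73])

# Bins of width 10; each bin includes its left edge,
# and the last bin also includes its right edge.
counts, edges = np.histogram(scores, bins=[30, 40, 50, 60, 70, 80, 90])

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left}-{right}: {n}")
```

Changing the bin width (as `binwidth=1` does in one of the cells above) changes the counts per bin, which is why the same data can look smooth or spiky.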

8- Boxen plot¶

A boxen (letter-value) plot shows more quantiles than a box plot, which gives a better picture of the spread and tails of a larger dataset.¶

In [24]:
#how to make categorical plots
sns.catplot(data = x, x='gender',y='reading score',kind='boxen')
Out[24]:
<seaborn.axisgrid.FacetGrid at 0x2308c1e9eb0>
In [25]:
#how to make categorical plots
sns.catplot(data = x, x='gender',y='reading score',hue='parental level of education',kind='boxen')
Out[25]:
<seaborn.axisgrid.FacetGrid at 0x2308c1f2520>
In [26]:
#how to make categorical plots
sns.catplot(data = x, x='gender',y='reading score',hue='race/ethnicity',kind='boxen')
Out[26]:
<seaborn.axisgrid.FacetGrid at 0x2308c26feb0>
In [27]:
#how to make categorical plots
sns.catplot(data = x, x='gender',y='reading score',hue='gender',kind='boxen')
Out[27]:
<seaborn.axisgrid.FacetGrid at 0x2308c1b2bb0>

9- Bar Plot¶

In [28]:
#how to make categorical plots
sns.catplot(data = x, x='gender',y='reading score',hue='race/ethnicity', kind='bar', capsize=0.1)  # capsize adds caps to the ends of the error bars
Out[28]:
<seaborn.axisgrid.FacetGrid at 0x2308c1f5ca0>
In [29]:
# ci=95 draws each error bar as a 95% bootstrap confidence interval for the mean reading score
# (seaborn 0.12 and later spell this errorbar=('ci', 95))
sns.catplot(data = x, x='gender',y='reading score',hue='lunch',kind='bar', capsize=0.1, ci=95)
Out[29]:
<seaborn.axisgrid.FacetGrid at 0x2308bd427f0>
In [30]:
sns.catplot(data = x, x='gender',y='reading score',hue='lunch',kind='bar', capsize=0.1, ci=95, col='test preparation course')
# the col parameter draws a separate panel for each value of 'test preparation course'
Out[30]:
<seaborn.axisgrid.FacetGrid at 0x23086353a60>
In [31]:
sns.catplot(data = x, x='gender',y='reading score',hue='parental level of education',kind='bar', capsize=0.1)
Out[31]:
<seaborn.axisgrid.FacetGrid at 0x2308c27e0a0>
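
The error bars on the bar plots above are confidence intervals for the mean, which seaborn estimates by bootstrapping. A minimal sketch of that idea, using invented scores and an assumed 2000 resamples (seaborn's own resample count and random seed will differ):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
scores = np.array([67, 59, 60, 78, 73, 70, 91, 35, 74, 60])  # invented values

# Resample with replacement many times and record the mean of each resample.
boot_means = np.array([
    rng.choice(scores, size=scores.size, replace=True).mean()
    for _ in range(2000)
])

# The middle 95% of the bootstrap means is the interval the error bar spans.
low, high = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {scores.mean():.1f}, 95% CI = ({low:.1f}, {high:.1f})")
```

The interval describes the uncertainty of the estimated mean. It does not say that 95% of the students fall inside it.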

Violin Plot¶

In [32]:
# how to create a violin plot
# the kind parameter selects the type of categorical plot
# the hue parameter splits each category for comparison
sns.catplot(data = x, x='gender',y='reading score',hue='race/ethnicity',kind='violin')
Out[32]:
<seaborn.axisgrid.FacetGrid at 0x2308c26ffd0>
In [33]:
# how to create a violin plot
# the kind parameter selects the type of categorical plot
sns.catplot(data = x, x='race/ethnicity',y='math score',hue='gender',kind='violin')
Out[33]:
<seaborn.axisgrid.FacetGrid at 0x2308bd5d7c0>
In [34]:
# how to create a violin plot
sns.catplot(data = x, x='lunch',y='writing score',hue='parental level of education',kind='violin')
Out[34]:
<seaborn.axisgrid.FacetGrid at 0x2308e2cb670>
In [35]:
# how to create a violin plot
# the col parameter draws separate panels for male and female students
sns.catplot(data = x, x='lunch',y='writing score',hue='parental level of education',kind='violin',col='gender')
Out[35]:
<seaborn.axisgrid.FacetGrid at 0x2308e258bb0>
In [36]:
# how to create a violin plot
# the col parameter draws a separate panel for each lunch type
sns.catplot(data = x, x='gender',y='reading score',hue='test preparation course',kind='violin',col='lunch')
Out[36]:
<seaborn.axisgrid.FacetGrid at 0x2308dc754f0>
In [37]:
# *Group violin plot*
sns.catplot(data=x, x= 'gender', y='math score', col='parental level of education', kind='violin')
Out[37]:
<seaborn.axisgrid.FacetGrid at 0x2308e33e460>

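Violin plot with swarmplot¶

In [38]:
# violin plots per lunch type, with a swarm plot overlaid
# note: sns.swarmplot is an axes-level function, so it draws only on the current (last) panel, 'lunch = free/reduced'
sns.catplot(data = x, x='gender',y='reading score',hue='test preparation course',kind='violin',col='lunch')
sns.swarmplot(data = x, x='gender',y='reading score',size=2)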
Out[38]:
<AxesSubplot:title={'center':'lunch = free/reduced'}, xlabel='gender', ylabel='reading score'>

10- Histogram with a KDE (density) curve¶

In [39]:
sns.histplot(data=x, x="math score", kde=True)
Out[39]:
<AxesSubplot:xlabel='math score', ylabel='Count'>
In [40]:
sns.histplot(data=x, x="reading score", kde=True)
Out[40]:
<AxesSubplot:xlabel='reading score', ylabel='Count'>
In [41]:
sns.histplot(data=x, x="writing score", kde=True)
Out[41]:
<AxesSubplot:xlabel='writing score', ylabel='Count'>

11- Line Plot¶

  1. A line plot shows the relationship between two variables.
  2. If one variable increases while the other decreases, the relationship is negative (inverse).
  3. If both variables increase together, the relationship is positive (direct).
In [42]:
# How to plot a line plot
sns.lineplot(data=x, x='math score', y='reading score')
Out[42]:
<AxesSubplot:xlabel='math score', ylabel='reading score'>
In [43]:
# How to plot a line plot
sns.lineplot(data=x, x='writing score', y='reading score', color  ='Green')
Out[43]:
<AxesSubplot:xlabel='writing score', ylabel='reading score'>
In [44]:
# How to plot a line plot
sns.lineplot(data=x, x='writing score', y='math score', color = 'red')
Out[44]:
<AxesSubplot:xlabel='writing score', ylabel='math score'>
In [45]:
# How to plot a line plot
sns.lineplot(data=x, x='writing score', y='math score', hue='gender', palette=['#8403fc','#fc03d3'])
Out[45]:
<AxesSubplot:xlabel='writing score', ylabel='math score'>
In [46]:
  # How to plot a line plot
sns.lineplot(data=x, x='writing score', y='math score', hue='parental level of education')
Out[46]:
<AxesSubplot:xlabel='writing score', ylabel='math score'>
In [47]:
  # How to plot a line plot
sns.lineplot(data=x, x='writing score', y='math score', hue='race/ethnicity')
Out[47]:
<AxesSubplot:xlabel='writing score', ylabel='math score'>
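
Many students share the same writing score, so there are several math scores per x value. By default, `sns.lineplot` sorts by x and draws the mean of y at each x (with a shaded confidence band). The line itself can be sketched with a plain groupby, here on invented values:

```python
import pandas as pd

# Invented scores: several students share the same writing score.
df = pd.DataFrame({
    "writing score": [50, 50, 60, 60, 60, 70],
    "math score":    [48, 52, 55, 65, 60, 72],
})

# One mean math score per distinct writing score: the points the line connects.
line = df.groupby("writing score")["math score"].mean()
print(line)
```

Seeing the line as a sequence of group means explains why it looks jagged where only a few students share a score.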

12- Stripplot¶

In [48]:
import seaborn as sns
sns.stripplot(data= x, x= "lunch", y= "math score", jitter=True, hue="gender")
plt.show()

13- Draw a simple linear regression line¶

  1. Fit and draw the trend line
In [49]:
sns.lmplot( data=x, x='math score',y='reading score',hue='parental level of education',row='gender', palette="Set1")
Out[49]:
<seaborn.axisgrid.FacetGrid at 0x23091d1ecd0>
In [51]:
sns.lmplot( data=x, x='math score',y='reading score',hue='race/ethnicity',row='gender', palette="Set1")
Out[51]:
<seaborn.axisgrid.FacetGrid at 0x23093bdca90>
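
The trend line in these plots is an ordinary least-squares fit. As a rough sketch of what is being fitted, the following uses only the five rows shown by `x.head()` and `np.polyfit`; this is an illustration of the technique, not a reproduction of seaborn's internals.

```python
import numpy as np

# Math and reading scores of the first five rows shown by x.head().
math = np.array([67, 40, 59, 77, 78], dtype=float)
reading = np.array([67, 59, 60, 78, 73], dtype=float)

# Fit reading = slope * math + intercept by least squares.
slope, intercept = np.polyfit(math, reading, deg=1)
predicted = slope * math + intercept

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print("residuals:", np.round(reading - predicted, 2))
```

A positive slope means reading scores tend to rise with math scores. With only five rows the numbers are illustrative and should not be read as a result for the full dataset.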

14- Joint plot¶

In [52]:
sns.jointplot(data=x, x='gender', y='math score', kind='scatter',palette='Set1')
Out[52]:
<seaborn.axisgrid.JointGrid at 0x2309303c6a0>
In [53]:
sns.jointplot(data=x, x='lunch', y='writing score', kind='scatter',color='#34eb46')
Out[53]:
<seaborn.axisgrid.JointGrid at 0x23093bedc10>

15- Density (KDE) plot¶

In [54]:
sns.set_theme(style="darkgrid")
sns.kdeplot(data=x, x='math score',y='reading score')
Out[54]:
<AxesSubplot:xlabel='math score', ylabel='reading score'>
In [55]:
sns.set_theme(style="darkgrid")
sns.kdeplot(data=x, x='math score',y='writing score')
Out[55]:
<AxesSubplot:xlabel='math score', ylabel='writing score'>
In [56]:
sns.set_theme(style="darkgrid")
sns.kdeplot(data=x, x='reading score',y='writing score')
Out[56]:
<AxesSubplot:xlabel='reading score', ylabel='writing score'>

EDA for the Data (Exploratory Data Analysis)¶

In [57]:
import pandas as pd
In [58]:
x = pd.read_csv("exams (1).csv")

1- display the last five rows¶

In [59]:
x.tail()
Out[59]:
gender race/ethnicity parental level of education lunch test preparation course math score reading score writing score
995 male group C high school standard none 73 70 65
996 male group D associate's degree free/reduced completed 85 91 92
997 female group C some high school free/reduced none 32 35 41
998 female group C some college standard none 73 74 82
999 male group A some college standard completed 65 60 62

2- Display the first five rows¶

In [60]:
x.head()
Out[60]:
gender race/ethnicity parental level of education lunch test preparation course math score reading score writing score
0 male group A high school standard completed 67 67 63
1 female group D some high school free/reduced none 40 59 55
2 male group E some college free/reduced none 59 60 50
3 male group B high school standard none 77 78 68
4 male group E associate's degree standard completed 78 73 68

3- Random sample of rows¶

In [61]:
x.sample(20)
Out[61]:
gender race/ethnicity parental level of education lunch test preparation course math score reading score writing score
948 male group E some high school free/reduced completed 49 50 45
320 male group E high school free/reduced completed 58 56 49
145 female group B high school free/reduced none 51 57 57
753 male group D associate's degree free/reduced none 61 59 54
63 male group A bachelor's degree standard completed 77 82 78
985 male group E associate's degree standard none 74 73 67
944 female group C associate's degree standard completed 57 72 73
522 male group D associate's degree standard none 78 74 71
177 male group B some high school free/reduced none 55 53 51
238 male group B associate's degree standard none 76 62 64
740 male group B some college standard none 52 53 51
495 male group E some college standard none 78 65 60
147 male group E some college standard none 60 55 46
807 female group D master's degree standard none 61 72 64
959 male group D master's degree standard completed 91 84 83
142 female group D bachelor's degree standard none 86 83 87
586 female group B master's degree standard completed 59 73 74
918 male group C some high school standard none 72 68 67
569 female group B some college standard none 70 73 66
798 male group C some college standard none 56 55 50

4- Random fraction of the data¶

In [62]:
# return a random 50% sample of the rows
x.sample(frac=0.5)
Out[62]:
gender race/ethnicity parental level of education lunch test preparation course math score reading score writing score
577 male group C master's degree standard none 68 60 62
344 male group B high school standard completed 80 80 79
676 male group E associate's degree free/reduced none 63 61 62
691 female group D associate's degree free/reduced none 68 84 80
848 male group E high school standard none 94 89 86
... ... ... ... ... ... ... ... ...
788 male group D high school standard none 61 62 60
779 male group C associate's degree free/reduced none 38 51 47
964 male group E bachelor's degree standard none 100 83 86
927 male group C bachelor's degree free/reduced completed 61 71 65
250 female group D some college free/reduced none 53 70 71

500 rows × 8 columns

The above code shows 50% sample data.¶
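
`sample` draws different rows on every run. When a repeatable sample is wanted, the `random_state` parameter fixes the draw. A small sketch on placeholder data (the seed value 42 is arbitrary):

```python
import pandas as pd

df = pd.DataFrame({"math score": range(10)})  # placeholder data

# The same seed gives the same rows every time.
half_a = df.sample(frac=0.5, random_state=42)
half_b = df.sample(frac=0.5, random_state=42)

print(half_a.index.tolist())
print(half_a.index.equals(half_b.index))
```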


5- Basic information about the data - EDA¶

In [63]:
# to know the number of rows and columns
In [64]:
x.shape
Out[64]:
(1000, 8)
In [65]:
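# to know the data types, call x.info(); the attribute x.info without parentheses (used below) only returns the bound method, whose repr prints the frame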
In [66]:
 x.info
Out[66]:
<bound method DataFrame.info of      gender race/ethnicity parental level of education         lunch  \
0      male        group A                 high school      standard   
1    female        group D            some high school  free/reduced   
2      male        group E                some college  free/reduced   
3      male        group B                 high school      standard   
4      male        group E          associate's degree      standard   
..      ...            ...                         ...           ...   
995    male        group C                 high school      standard   
996    male        group D          associate's degree  free/reduced   
997  female        group C            some high school  free/reduced   
998  female        group C                some college      standard   
999    male        group A                some college      standard   

    test preparation course  math score  reading score  writing score  
0                 completed          67             67             63  
1                      none          40             59             55  
2                      none          59             60             50  
3                      none          77             78             68  
4                 completed          78             73             68  
..                      ...         ...            ...            ...  
995                    none          73             70             65  
996               completed          85             91             92  
997                    none          32             35             41  
998                    none          73             74             82  
999               completed          65             60             62  

[1000 rows x 8 columns]>
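
The output above is the repr of the method object, not a summary. A minimal sketch of the difference, on a two-row placeholder frame; `buf=` is used here only to capture the text that `info()` would normally print:

```python
import io

import pandas as pd

df = pd.DataFrame({"gender": ["male", "female"], "math score": [67, 40]})

method = df.info          # just the bound method; nothing is computed

buffer = io.StringIO()
df.info(buf=buffer)       # calling it writes the summary (dtypes, non-null counts)
report = buffer.getvalue()
print(report)
```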

6- Describe the data¶

In [67]:
x.describe()
Out[67]:
math score reading score writing score
count 1000.000000 1000.000000 1000.000000
mean 66.396000 69.002000 67.738000
std 15.402871 14.737272 15.600985
min 13.000000 27.000000 23.000000
25% 56.000000 60.000000 58.000000
50% 66.500000 70.000000 68.000000
75% 77.000000 79.000000 79.000000
max 100.000000 100.000000 100.000000

.Count: number of non-null values¶

.Mean: the average¶

.Std: standard deviation¶

.Min: minimum value¶

.25%: first quartile¶

.50%: median, or second quartile¶

.75%: third quartile¶

.Max: maximum value¶
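
To make these definitions concrete, the sketch below runs `describe()` on the five math scores shown by `x.head()` and recomputes the standard deviation by hand. Note that pandas uses the sample standard deviation (dividing by n - 1).

```python
import numpy as np
import pandas as pd

# Math scores of the first five rows shown by x.head().
scores = pd.Series([67, 40, 59, 77, 78], name="math score")

summary = scores.describe()
print(summary)

# Sample standard deviation: squared deviations from the mean, divided by n - 1.
manual_std = np.sqrt(((scores - scores.mean()) ** 2).sum() / (len(scores) - 1))
print(manual_std)
```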

7- Identify if there are any null values¶

In [68]:
x.isnull().values.any()
Out[68]:
False

8- value_counts function¶

In [69]:
# value_counts() is a useful method that counts how often each distinct value occurs in a Series
x["math score"].value_counts()
Out[69]:
63    34
71    30
77    30
74    28
57    27
      ..
26     2
23     1
29     1
34     1
25     1
Name: math score, Length: 77, dtype: int64
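
`value_counts()` is most informative on categorical columns. A short sketch using the five 'test preparation course' values shown by `x.head()`; `normalize=True` returns shares instead of counts:

```python
import pandas as pd

# 'test preparation course' values of the first five rows shown by x.head().
prep = pd.Series(
    ["completed", "none", "none", "none", "completed"],
    name="test preparation course",
)

counts = prep.value_counts()                 # absolute counts per category
shares = prep.value_counts(normalize=True)   # fractions that sum to 1

print(counts)
print(shares)
```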

9- Grouping data¶

In [70]:
# groupby splits the rows into groups; here the mean reading and writing scores are computed for each (math score, gender) pair
x.groupby(['math score','gender']).mean(numeric_only=True)
Out[70]:
reading score writing score
math score gender
13 female 32.500000 30.000000
23 female 44.000000 44.000000
25 female 36.000000 37.000000
26 female 39.000000 37.000000
28 female 41.000000 40.500000
... ... ... ...
97 male 87.000000 87.000000
98 male 89.000000 87.333333
99 male 86.000000 89.666667
100 female 100.000000 100.000000
male 89.454545 90.181818

140 rows × 2 columns

10- Duplicate values¶

In [71]:
x.duplicated().sum()
Out[71]:
1
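
A count of duplicates says how many rows repeat an earlier row, but not which ones. The sketch below, on a small invented frame, shows one way to inspect and then drop them:

```python
import pandas as pd

# Invented rows: the third row repeats the first.
df = pd.DataFrame({
    "gender": ["male", "female", "male"],
    "math score": [67, 40, 67],
})

# keep=False marks every member of a duplicated group, not only the repeats.
dupes = df[df.duplicated(keep=False)]
print(dupes)

# Keep the first occurrence of each row and drop the rest.
deduped = df.drop_duplicates()
print(len(df), "->", len(deduped))
```

Whether a duplicate should be dropped is a judgment call: two different students can legitimately share the same attributes and scores.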

11- Unique values in the data¶

In [72]:
# only the last expression in a cell is displayed, so the output below is for 'writing score'
x['math score'].unique()
x['reading score'].unique()
x['writing score'].unique()
Out[72]:
array([ 63,  55,  50,  68,  76,  84,  65,  45,  85,  90,  73,  57,  42,
        44,  31,  88,  54,  32,  56,  60,  89,  51,  77,  39,  71,  74,
        75,  72,  64,  82,  70,  87,  78,  49,  47,  62,  83,  48,  59,
        97,  81,  67,  69,  61,  93, 100,  53,  79,  58,  33,  86,  66,
        46,  80,  91,  92,  95,  99,  96,  28,  52,  24,  40,  43,  94,
        23,  38,  30,  35,  41,  98,  36,  27,  26,  34,  37], dtype=int64)
In [73]:
# number of unique values in each column
x.nunique()
Out[73]:
gender                          2
race/ethnicity                  5
parental level of education     6
lunch                           2
test preparation course         2
math score                     77
reading score                  73
writing score                  76
dtype: int64

12- Find the Null Values¶

In [74]:
x.isnull().sum()
Out[74]:
gender                         0
race/ethnicity                 0
parental level of education    0
lunch                          0
test preparation course        0
math score                     0
reading score                  0
writing score                  0
dtype: int64

13- Know the data types¶

In [75]:
#Datatypes
x.dtypes
Out[75]:
gender                         object
race/ethnicity                 object
parental level of education    object
lunch                          object
test preparation course        object
math score                      int64
reading score                   int64
writing score                   int64
dtype: object

14- Filter the Data¶

In [76]:
#Filter data
x[x['math score']==50].head()
Out[76]:
gender race/ethnicity parental level of education lunch test preparation course math score reading score writing score
175 female group D some college free/reduced none 50 56 60
196 male group D bachelor's degree standard none 50 46 48
200 female group D associate's degree standard completed 50 63 65
296 male group D some high school free/reduced none 50 47 46
312 male group C associate's degree free/reduced none 50 48 43
In [77]:
x[x['reading score'] <50]
Out[77]:
gender race/ethnicity parental level of education lunch test preparation course math score reading score writing score
9 male group C some college free/reduced none 47 42 45
16 male group B high school standard none 58 47 42
18 female group C associate's degree free/reduced none 23 44 44
19 male group C some college free/reduced none 39 32 31
24 male group E some high school free/reduced none 46 38 32
... ... ... ... ... ... ... ... ...
938 male group A high school free/reduced none 45 33 32
962 male group B some high school standard completed 46 46 44
976 female group B some college free/reduced completed 31 29 35
981 male group C some college standard none 64 48 48
997 female group C some high school free/reduced none 32 35 41

103 rows × 8 columns
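
Filters can be combined. A minimal sketch on invented rows: each condition goes in parentheses and the conditions are joined with `&` (and) or `|` (or). `DataFrame.query` is an alternative spelling, with backticks around column names that contain spaces.

```python
import pandas as pd

# Invented rows, for illustration only.
df = pd.DataFrame({
    "lunch": ["standard", "free/reduced", "free/reduced", "standard"],
    "reading score": [67, 59, 35, 46],
})

# Students with a reading score below 50 AND a free/reduced lunch.
low_readers = df[(df["reading score"] < 50) & (df["lunch"] == "free/reduced")]
print(low_readers)

# The same filter written as a query string.
same = df.query("`reading score` < 50 and lunch == 'free/reduced'")
print(same.equals(low_readers))
```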

15- A quick box plot¶

In [78]:
#Boxplot
x[['reading score']].boxplot()
Out[78]:
<AxesSubplot:>

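16- Correlation Matrix - EDA¶

In [79]:
# show the pairwise correlation between the numeric score columns
# (select_dtypes keeps only numeric columns, which newer pandas versions require for corr)
x.select_dtypes('number').corr()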
Out[79]:
math score reading score writing score
math score 1.000000 0.819398 0.805944
reading score 0.819398 1.000000 0.954274
writing score 0.805944 0.954274 1.000000

The correlation matrix holds values from -1 to +1: +1 means a perfect positive linear correlation, -1 a perfect negative one, and 0 no linear correlation. Here reading and writing scores are the most strongly correlated pair (0.95).¶

-Correlation Plot¶

In [80]:
# you can also visualize the correlation matrix with the seaborn library

sns.heatmap(x.select_dtypes('number').corr())
Out[80]:
<AxesSubplot:>
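
The coefficients in the matrix and heatmap are Pearson correlations, the default of `corr()`. As a sketch of what that number means, the following recomputes it by hand for the five rows shown by `x.head()`; with so few rows the value is illustrative only.

```python
import numpy as np
import pandas as pd

# Math and reading scores of the first five rows shown by x.head().
df = pd.DataFrame({
    "math score": [67, 40, 59, 77, 78],
    "reading score": [67, 59, 60, 78, 73],
})

# Pearson r: covariance of the two columns divided by the product of their spreads.
a = df["math score"] - df["math score"].mean()
b = df["reading score"] - df["reading score"].mean()
r_manual = (a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum())

r_pandas = df["math score"].corr(df["reading score"])
print(r_manual, r_pandas)
```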

End Note of the EDA¶

EDA is one of the most important parts of any analysis, because it summarizes the features and characteristics of the dataset. Now that we know what exploratory data analysis is and why it is important, how exactly does it work? In short, exploratory data analysis considers what to look for, how to look for it, and, finally, how to interpret what we discover. Exploring data with an open mind tends to reveal its underlying nature far more readily than making assumptions about the rules we think (or want) it to adhere to. In data analytics terms, exploratory data analysis is generally more of a qualitative investigation than a quantitative one. In this notebook we have looked at some basic descriptive analyses of the data using Python libraries.¶